In this first part, we:
You can download this notebook to run it locally.
In [1]:
import pandas
import graphistry
# You can also set your API key once and for all in the environment variable "GRAPHISTRY_API_KEY".
#graphistry.register(key='<go to www.graphistry.com/api-request to get a key>', server='labs.graphistry.com')
In [3]:
logs = pandas.read_csv('.././../data/honeypot.csv')
logs[:3] # Show the first three rows of the loaded dataframe
Out[3]:
Dates in time(max) and time(min) are Unix timestamps. Pandas helps parse them.
In [3]:
logs['time(max)'] = pandas.to_datetime(logs['time(max)'], unit='s')
logs['time(min)'] = pandas.to_datetime(logs['time(min)'], unit='s')
logs[:3]
Out[3]:
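As a small illustration of what the conversion above does, here is a sketch on a toy dataframe. The values are made up and are not taken from honeypot.csv; only the column names match the tutorial. It also shows that, once parsed, the two columns can be subtracted to get a duration.

```python
import pandas

# Toy stand-in for the honeypot log; these numbers are invented for illustration.
toy_logs = pandas.DataFrame({
    'time(min)': [0, 86400],
    'time(max)': [60, 90000],
})

for column in ['time(min)', 'time(max)']:
    # unit='s' tells pandas the numbers are seconds since 1970-01-01 00:00:00 UTC
    toy_logs[column] = pandas.to_datetime(toy_logs[column], unit='s')

# Parsed columns support datetime arithmetic, e.g. the span of each record
toy_logs['duration'] = toy_logs['time(max)'] - toy_logs['time(min)']
print(toy_logs)
```

Without unit='s', pandas would interpret plain integers as nanoseconds since the epoch, so the unit argument matters here.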
In [4]:
g = graphistry.bind(source='attackerIP', destination='victimIP').edges(logs)
g.plot()
Out[4]:
We compute the desired edge colors by creating a new column (ecolor) that assigns each vulnerability name a different color code. We then tell the plotter to override the default edge coloring by binding our data to the attribute edge_color.
See the list of color codes at https://graphistry.github.io/docs/legacy/api/0.9.2/api.html#extendedpalette
In [5]:
vulnerabilityToColorCode = {vulnName: idx for idx, vulnName in enumerate(logs.vulnName.unique())}
vulnerabilityToColorCode
Out[5]:
In [6]:
edges = logs.copy() # Copy the original data to avoid unintended modifications.
#Set an edge's color to the value in the vulnerability lookup table
edges['ecolor'] = edges.vulnName.map(lambda vulnName: vulnerabilityToColorCode[vulnName])
edges[:3]
Out[6]:
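The lookup-table step can be tried in isolation. The sketch below uses a toy dataframe with invented vulnerability names (vuln-a, vuln-b, vuln-c are placeholders, not values from the dataset) and passes the dictionary directly to map, which is equivalent to the lambda form for keys that are present.

```python
import pandas

# Toy edge table; the vulnerability names are hypothetical placeholders.
toy_edges = pandas.DataFrame({
    'vulnName': ['vuln-a', 'vuln-b', 'vuln-a', 'vuln-c'],
})

# unique() preserves order of first appearance, so codes are assigned 0, 1, 2, ...
code_for = {name: idx for idx, name in enumerate(toy_edges['vulnName'].unique())}

# Series.map accepts a dict; names missing from the dict would become NaN
toy_edges['ecolor'] = toy_edges['vulnName'].map(code_for)

# Sanity check: every edge received a color code
assert toy_edges['ecolor'].notna().all()
print(toy_edges)
```

One difference worth knowing: the lambda form raises a KeyError on an unknown name, while the dict form silently produces NaN, which is why the sketch checks for missing values afterwards.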
In [7]:
# Finally, add the binding of ecolor to edge colors and plot
g2 = g.bind(edge_color='ecolor')
g2.plot(edges)
Out[7]:
To set the size and colors of nodes, we need to create a node table where each node is represented by a row. The node identifiers come from the source and destination columns of the edge table, and they will be the first column of the node table.
We proceed in a few steps: collect all attacker IPs and color them red, collect all victim IPs and color them yellow, and then concatenate the IPs together into one table.
In [8]:
#Create the table of attackers. Our node identifier column will be called "IP".
attackers = edges.attackerIP.to_frame('IP')
attackers['type'] = 'attacker'
attackers['pcolor'] = 67006 #red
attackers[:3]
Out[8]:
In [9]:
# Same steps, but for victims (destinations)
victims = edges.victimIP.to_frame('IP')
victims['type'] = 'victim'
victims['pcolor'] = 67001 #yellow
victims[:3]
Out[9]:
In [10]:
#Combine the two tables
#If an IP is both an attacker and a victim, prioritize coloring it as an attacker
nodes = pandas.concat([attackers, victims], ignore_index=True).drop_duplicates('IP')
nodes[:4]
Out[10]:
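To see why concatenating attackers first gives them priority, here is a minimal sketch on a toy edge table. The IP addresses are invented for illustration. drop_duplicates keeps the first occurrence of each IP by default, so an address that appears in both roles keeps its attacker row.

```python
import pandas

# Toy edge table with made-up addresses; 10.0.0.2 is both an attacker and a victim.
toy_edges = pandas.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.2', '10.0.0.1'],
    'victimIP':   ['10.0.0.2', '10.0.0.3', '10.0.0.3'],
})

attackers = toy_edges['attackerIP'].to_frame('IP')
attackers['type'] = 'attacker'

victims = toy_edges['victimIP'].to_frame('IP')
victims['type'] = 'victim'

# Attackers come first in the concatenation, so their rows win on duplicates
nodes = pandas.concat([attackers, victims], ignore_index=True).drop_duplicates('IP')

# Every edge endpoint should appear exactly once in the node table
endpoints = set(toy_edges['attackerIP']) | set(toy_edges['victimIP'])
assert set(nodes['IP']) == endpoints
assert nodes['IP'].is_unique

node_type = dict(zip(nodes['IP'], nodes['type']))
print(node_type)
```

Swapping the order to [victims, attackers] would flip the priority, so the order of the list passed to concat is the design choice that decides how dual-role addresses are colored.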
In [11]:
# We can now pass both the edge and node tables to "plot".
g2.bind(node='IP', point_color='pcolor').plot(edges, nodes)
Out[11]:
Within the visualization, you can now filter and drill down into the graph.
For cool results, try exploring the histograms for victimPort, vulnName, and count. By selecting a region of a histogram or clicking on a bar, you can filter the graph. For instance, the NetApi vulnerability has the biggest bar and is therefore the most common vulnerability, yet clicking on its bar and filtering to only those attacks shows that it is present only in the big cluster of attacks against IP 172.31.14.66. (Click again to remove the filter.)
In the next part of the tutorial, we show